Lock in Feedback in Sequential Experiments

Maurits Kaptein and Davide Iannuzzi* 

Abstract: We often encounter situations in which an experimenter wants
to find, by sequential experimentation, $x_{\max} = \arg\max_x f(x)$, where $f(x)$
is a (possibly unknown) function of a well controllable variable $x$. Taking
inspiration from physics and engineering, we have designed a new method
to address this problem. In this paper, we first introduce the method in
continuous time, and then present two algorithms for use in sequential
experiments. Through a series of simulation studies, we show that the method
is effective for finding maxima of unknown functions by experimentation,
even when the maximum of the function drifts or when the signal-to-noise
ratio is low.


1. Introduction 

When designing an experiment where a given parameter must be kept constant
throughout the entire duration of the measurement, physicists and engineers
often rely on feedback techniques that, in real time, can properly re-adjust the
configuration of the experiment to compensate for unexpected drifts (Scofield,
1994). Fig. 1 illustrates, for instance, a well-established approach that is used
to maintain a variable $x$ always locked at the value that maximizes the value
of another variable $y$, which is some function (possibly with large noise) of
$x$. The algorithm behind this approach, which will be described in more depth
later in the text, is based on the following steps:

1 Fix a central value $x_0$ of the variable $x$;

2 Add an oscillation of amplitude $A$ at a fixed angular frequency $\omega$:
$x = x_0 + A \cos(\omega t)$;

3 Measure the amplitude $A_y$ of the oscillations that the variable $y$ shows, in
response to the oscillation of the variable $x$, at the same angular frequency
$\omega$, and further measure whether the oscillations are in phase or out of phase;

4 Set a new value of $x_0$, adding (if the oscillations of $y$ are in phase with the
oscillations of $x$) or subtracting (if the oscillations of $y$ are out of phase with
respect to the oscillations of $x$) a value proportional to the value measured
in step 3: $x_{0,\mathrm{new}} = x_0 \pm \gamma A_y$, where $\gamma$ is a constant. Iterate steps 2 to 4
for the whole duration of the experiment.

The feedback loop described above pushes the value of $x_0$ closer and closer
to the value $x_{\max}$ that maximizes $y$. As $x_0$ approaches $x_{\max}$, the oscillations
in $y$ become smaller and smaller, moving $x_0$ in a series of steps of decreasing
size. Finally, when $x_0 = x_{\max}$, the variable $y$ ceases to oscillate at frequency $\omega$.
Because of this, $x_0$ can stay locked in on $x_{\max}$. However, if the curve suddenly
shifts to another position (e.g., if the relationship between $x$ and $y$ changes, a
phenomenon referred to as concept drift (Gaber et al., 2005; Anagnostopoulos
et al., 2012)), the $\omega$ component of $y$ becomes different from zero again, forcing
the feedback loop to move the value of $x_0$ towards the new value of $x_{\max}$.
Hence, the feedback loop enables one to hold on, sequentially, to the value of $x$
that maximizes the value of $y$.

Interestingly, the feedback loop described above can work well even if the
variable $y$ is affected by a high degree of noise. To extract the signal at frequency
$\omega$, in fact, one can make use of a commercial instrument called a lock-in amplifier,
which rejects all the components of the signal that do not beat at the frequency
of interest. The algorithm used by a lock-in amplifier can of course be applied
to digital (discrete timepoints) data as well. It is thus worth asking whether the
approach adopted in a lock-in amplifier may be used in other contexts where,
in the presence of a highly noisy set of data, one wants to maintain one variable
locked to the value that maximizes the value of another.



Fig 1. Illustration of the lock-in principle used in physics and engineering to
bring and maintain an independent, controllable variable $x$ onto the value $x_{\max}$ for which a
dependent variable $y$ is maximized. The value of $x$ is oscillated sinusoidally around a central
value $x_0$. (a): If $x_0 < x_{\max}$, $y$ oscillates at the same frequency as $x$, in phase (i.e., a maximum
value of $x$ corresponds to a maximum value of $y$). (b): If $x_0 > x_{\max}$, $y$ oscillates again at
the same frequency as $x$, but with opposite phase (i.e., a maximum value of $x$ corresponds to
a minimum value of $y$). (c): If $x_0 = x_{\max}$, $y$ ceases to oscillate at the frequency of $x$, but
starts to oscillate at a doubled frequency. Lock-in amplifiers can detect the amplitude and the
phase of the oscillation at a reference frequency, and, therefore, indicate whether $x_0$ is smaller
than, larger than, or equal to $x_{\max}$.


Tantalized by this opportunity, we propose here to use lock-in feedback (LiF)
algorithms for the optimization of the price in (e.g.) a rebate action. The idea is
to present each customer a different price, which is changed sinusoidally around
a central value, causing the revenue to oscillate at the same frequency. As the
customers take their purchasing decisions, a lock-in algorithm monitors the
oscillations of the revenue at the price oscillation frequency. As in the feedback
loop described above, the central value of the price is continuously adjusted
until the revenue ceases to oscillate at that frequency. At this point, in fact, the
revenue is maximum (price elasticity = 1). If an unexpected event moves the
price elasticity curve, the algorithm will automatically push the central price
towards the new maximizing value.

Besides the pricing of products in rebate actions, many more examples could be
conceived in the social sciences:

• In economics, firms might be able to manipulate the price x of an offering 
and subsequently observe their revenue y. Here a firm seeks to find the 
value of x that maximizes y (for examples see Kung et al., 2002; Jiang 
et al., 2011). 

• In industry, the outcome y of a business process might depend on the 
amount of some raw material x used in the process. 

• In communication research, a communication professional might seek to 
find the length of an email message x that leads to the highest number of 
clicks y on a link in that message (Ansari and Mela, 2003). 

• In medicine, a physician seeks to find the optimal dose x of a medicine 
to maximize the health outcome y of her patients (see, e.g., Sapareto and 
Dewey, 1984; Marschner, 2007). 

• In education, scholars might seek to select learning tasks, which are
quantified by their difficulty x, that have the highest effect on the learning y of
their pupils.

In the above cases the functional form of $f(x)$ is often not known, the
outcome $y$ is observed with noise, and the treatment values that maximize
the outcome are likely subject to concept drift (Gaber et al., 2005; Anagnostopoulos
et al., 2012), that is, they change over time. Here we present a method to find
$x_{\max}$ which does not require an explicit specification of $f(x)$ or its derivatives,
performs well in the face of noise, and is robust to concept drift.

To demonstrate the merits of LiF in such cases, we have performed an extensive
numerical exercise that simulates the performance of LiF in a diverse range of
situations, including ones where the observed signal is merely the choice of a
consumer whether or not to adopt a product at a given (rebate) price, a scenario
directly in line with the pricing challenges identified above. We show that, in
the presence of the noise induced by the variance of the willingness to pay across
the population of the customers entering the shop, our lock-in algorithm allows
the seller to both determine and maintain the price that optimizes the revenue of
the shop. Furthermore, we demonstrate that if the price elasticity curve changes,
the algorithm can detect the direction of the change and converge again to the
optimal price.

It has to be noted that finding optimal (according to some specified criterion)
treatment values in (sequential) experiments is a well-known and well-studied
challenge. This challenge is acknowledged in many branches of science and
engineering (see, e.g., Allen et al., 2003; Bardsley et al., 1996; Kuck et al., 2006).
An often researched topic is that of design optimization (DO), in which
experimental designs are identified that lead to the smallest possible variances in
the estimated model parameters (Burnetas and Katehakis, 1996; McClelland,
1997). More recently, an interest in adaptive design optimization (ADO)
methods (Myung and Pitt, 2009b; Myung et al., 2013) and sequential experimentation
methods has emerged: researchers are looking for effective ways to sequentially
determine optimal treatment values in experiments as the experimental data is
being collected (Zhang and Lee, 2010). Notably, work on Multi Armed Bandit
(MAB) problems (e.g., Lai, 1987; Whittle, 1980; Scott, 2010; Bubeck et al.,
2011a; Yue et al., 2012) and stochastic optimization (e.g., Agarwal et al., 2011)
has led to efficient sequential sampling schemes for various experimental designs
and optimization criteria.

This paper, however, introduces a novel sequential sampling scheme for a
specific sequential design problem: we examine the problem in which the
treatment values are continuous (e.g., with $x \in \mathbb{R}$) and the researcher seeks
a treatment value $x_{\max}$ at which the observed outcome $y$ (which, at least in
part, depends on $x$) obtains its maximal value. Thus we examine the
situation in which an experimenter wants to find, by sequential experimentation,
$x_{\max} = \arg\max_x f(x)$, where $f(x)$ is a (possibly unknown) function of a well
controllable variable $x$ and is likely observed with noise. We focus on the simple
case where $x$ is a scalar. In the remainder of the paper, we index sequential
trials by $t \in \{1, \ldots, \mathcal{T}\}$. Our ultimate aim is to describe an experimental method
for manipulating $x_t$ (in discrete time) to find, sequentially, the value of $x$ that
maximizes $y$.

The current manuscript is structured as follows: first, we briefly review the
literature on DO, MAB problems, and stochastic optimization to position our
method. Next, we discuss LiF as a solution to the treatment optimization
problem considered in this paper. LiF is based on a solution that is routinely
implemented in physics and engineering applications, which relies on the idea of
systematically changing the value of the treatment in time via so-called lock-in
amplifier techniques (Scofield, 1994). We introduce its basic principles in
continuous time. Subsequently, we present two algorithms to use LiF in sequential
experiments. We then, by simulation, compare the two algorithms, and
examine the performance of LiF in several scenarios of signal-to-noise ratio and in
situations of concept drift. Furthermore, we examine the use of LiF in cases in
which the observable outcome is discrete, which is for example the case in the
optimization of prices as described above. Finally, we examine the empirical
regret (the search cost of the algorithm compared to an algorithm which has full
information) of the proposed procedure and compare it to a standard solution
in the MAB literature (Berry and Fristedt, 1985).

1.1. Treatment optimization methods 

The problem of finding $x_{\max}$ is treated in a number of branches of the
experimental design and machine learning literature. The problem can be approached
as an optimal design problem, in which the main aim is to design an experiment
that efficiently provides us with information regarding $f(x)$ (see, e.g., O’Brien
and Funk, 2003; Myung and Pitt, 2009a). Often, in the DO literature,
experiments are treated statically, and the functional form of the data generating
function is assumed known: the remaining question is to determine the optimal


imsart-generic ver. 2011/11/15 file: manuscript.tex date: January 13, 2016 


treatments given a fixed size of the experiment and an assumed relationship to 
precisely estimate the parameters of interest. 

Recently, Myung et al. (2013) introduced an advanced method of DO into
the psychology literature called Adaptive Design Optimization (ADO). The aim
of ADO is to create adaptive experiments which are optimized to distinguish
between competing explanations of the data (Myung and Pitt, 2009b). However,
in this literature the main aim is to find treatment values to efficiently estimate
parameters given a number of model assumptions. Instead, our focus is on
efficiently finding treatment values which maximize some observable outcome of
the experiment.

Sequentially finding optimal treatments, where optimal is defined in terms
of observed outcomes, is explicitly studied in the MAB literature (Berry and
Fristedt, 1985). In this problem specification researchers consider policies $V$
which describe how to select actions $a \in A$ (the treatment values) at different
times $t$, where the aim is to maximize the cumulative reward $R(t) = \sum_{i=1}^{t} r_i$
(Bubeck et al., 2011a). The reward is assumed to be a function, possibly with
noise, of the actions. Many specifications of the MAB problem exist: researchers
have considered independent treatments (the traditional $k$-armed bandit
problem (Whittle, 1980)), related treatments, continuous treatments, etc. (Audibert
et al., 2009; Bubeck et al., 2011b). The MAB problem, and its generalization,
the contextual MAB problem (Li et al., 2010; Beygelzimer et al., 2011), present
an active area of research in the machine learning literature.

The literature on stochastic optimization with bandit feedback (Agarwal
et al., 2011, 2010) considers the problem of finding the optimal value of
continuous treatments (Flaxman et al., 2005). Of special interest for the current
proposal are derivative-free (or gradient-free) methods in which the gradient of
the function (which is of use for, e.g., (stochastic) gradient descent methods) is
assumed unknown and is itself approximated during the sequential experiment
(Shamir, 2012). In this paper we present a derivative-free method to perform
stochastic optimization with bandit feedback. The presented method is
well-suited for practical use in sequential experiments due to its ease of
implementation: in the current paper we provide several algorithms for performing the
optimization in real-life settings. Before presenting our novel sequential approach
to solving the continuous treatment optimization problem, we first introduce its
theoretical background assuming that the treatment does not vary in discrete
sequential steps, but rather can be varied continuously (in continuous time).

2. Finding the maximum of a curve with a lock-in algorithm 

In this section we detail the basic principles behind LiF assuming continuous
time in which $x$ can be manipulated. Let us assume that $y$ is a continuous function
$f$ of $x$: $y = f(x)$. Let us further assume that $x$ oscillates with time according to:

$$x(t) = x_0 + A \cos(\omega t) \qquad (1)$$

where $\omega$ is the angular frequency of the oscillation, $x_0$ its central value, and $A$
its amplitude. For relatively small values of $A$, Taylor expanding $f(x)$ around
$x_0$ to the second order, one obtains:


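$$y(x(t)) = f(x_0) + \left(x_0 + A\cos(\omega t) - x_0\right)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{2}\left(x_0 + A\cos(\omega t) - x_0\right)^2\left.\frac{d^2 f}{dx^2}\right|_{x=x_0}, \qquad (2)$$

which can be simplified to:

$$y(x(t)) = k + A\cos(\omega t)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{4}A^2\cos(2\omega t)\left.\frac{d^2 f}{dx^2}\right|_{x=x_0}, \qquad (3)$$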

where $k = f(x_0) + \frac{1}{4}A^2\left(\left.d^2 f/dx^2\right|_{x=x_0}\right)$. It is thus evident that, for small
oscillations, $y$ becomes the sum of three terms: a constant term, a term oscillating
at angular frequency $\omega$, and a term oscillating at angular frequency $2\omega$.

Suppose we ourselves can actively manipulate $x$ and measure $y$, and that $f$
is continuous and has only one maximum and no minimum.$^1$ Further suppose
that one is interested in finding the value $\arg\max_x f(x)$, which we denote by
$x_{\max}$, and that our measurements of $y$ contain noise:

$$y(t) = f(x(t)) + \epsilon_t \qquad (4)$$

where $\epsilon$ denotes the noise and $\epsilon \sim \pi(\cdot)$, where $\pi$ is some probability density
function and $E[\epsilon \mid x] = 0$.

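Following the scheme used in physical lock-in amplifiers (see, e.g., Scofield,
1994), we multiply the observed $y$ variable by $\cos(\omega t)$. Using eq. 3 and eq. 4,
one obtains:

$$y_\omega(t) = \cos(\omega t)\left[k + A\cos(\omega t)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{4}A^2\cos(2\omega t)\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} + \epsilon\right] \qquad (5)$$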


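where $y_\omega$ is the value of $y$ after it has been multiplied by $\cos(\omega t)$. Eq. 5 can be
written more compactly as:

$$y_\omega(t) = \frac{A}{2}\left.\frac{df}{dx}\right|_{x=x_0} + k_\omega\cos(\omega t) + k_{2\omega}\cos(2\omega t) + k_{3\omega}\cos(3\omega t) + \epsilon\cos(\omega t) \qquad (6)$$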


1 For simplicity of exposition we only consider these well-behaved functions in this paper.
where

$$k_\omega = k + \frac{A^2}{8}\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} \qquad (7)$$

$$k_{2\omega} = \frac{A}{2}\left.\frac{df}{dx}\right|_{x=x_0} \qquad (8)$$

$$k_{3\omega} = \frac{A^2}{8}\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} \qquad (9)$$

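Integrating $y_\omega$ over a time $T = \frac{2\pi N}{\omega}$, where $N$ is a positive integer and $T$ denotes
the time needed to integrate $N$ full oscillations, one obtains:

$$y^*_\omega = \frac{1}{T}\int_0^T y_\omega\,dt = \frac{A}{2}\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{T}\int_0^T \epsilon\cos(\omega t)\,dt \qquad (10)$$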


Depending on the noise level, one can tailor the integration time, $T$, in such
a way as to reduce the second term on the right-hand side of eq. 10 to negligible
levels, effectively averaging out the noise in the measurements. Under those
circumstances, $y^*_\omega$ provides a direct measurement of the value of the first derivative
of $f$ at $x = x_0$.
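The demodulation step can be checked numerically. The following is a minimal sketch, assuming a noise-free signal and a discrete sum in place of the integral; the function name `lock_in_derivative` and its default arguments are illustrative choices, not part of the original text.

```python
import numpy as np

def lock_in_derivative(f, x0, A=0.1, n_periods=5, samples_per_period=200):
    """Estimate f'(x0) by demodulating y = f(x0 + A cos(wt)) with cos(wt)."""
    # One oscillation period spans `samples_per_period` time steps.
    omega = 2 * np.pi / samples_per_period
    t = np.arange(1, n_periods * samples_per_period + 1)
    y = f(x0 + A * np.cos(omega * t))          # noise-free response
    y_star = np.mean(y * np.cos(omega * t))    # discrete analogue of eq. 10
    return 2.0 * y_star / A                    # invert y* = (A/2) f'(x0)

# Parabola of the kind used in the simulation studies: f'(x) = -4 (x - 5).
f = lambda x: -2.0 * (x - 5.0) ** 2
estimate = lock_in_derivative(f, x0=3.0)
```

For a parabola the second-order expansion is exact, so the estimate agrees with the analytic derivative up to floating-point error; for other smooth functions the agreement is only approximate and degrades as $A$ grows.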

The above method thus yields quantitative information regarding the first
derivative of $f$ at $x = x_0$, providing, in this way, a logical update strategy for
$x_0$: if $y^*_\omega < 0$, then $x_0$ is larger than the value of $x$ that maximizes $f$; likewise, if
$y^*_\omega > 0$, $x_0$ is smaller than the value of $x$ that maximizes $f$. Thus, based on the
oscillation observed in $y_\omega$ we are now able to move $x_0$ closer to $x_{\max} = \arg\max_x f(x)$
using an update rule $x_0 := x_0 + \gamma y^*_\omega$, where $\gamma$ quantifies the learning rate of the
procedure. Hence, we can set up a feedback loop that allows us to keep $x_0$ close
to $x_{\max}$, even if $f(x)$ changes over time.

Note that, multiplying $y$ by $\cos(2\omega t)$ and using a similar approach to the one
described above to extract the amplitude of the oscillation of $y$ at frequency $2\omega$,
one would be able to measure the second derivative of the function $f$ at $x = x_0$.
This property can be useful when, for instance, $f(x)$ is known to be an exact
parabola: one can then derive not only the direction of the step towards the
maximum, but also the exact step size (see Appendix 9).
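As an illustration of the second-harmonic idea, the sketch below demodulates a noise-free parabola at both $\omega$ and $2\omega$. From eq. 3, averaging $y\cos(\omega t)$ over full periods gives $(A/2) f'(x_0)$ and averaging $y\cos(2\omega t)$ gives $(A^2/8) f''(x_0)$; combining the two into a Newton step $x_0 - f'/f''$ is our own illustrative choice and is not necessarily the form used in the Appendix. The name `parabola_step` is hypothetical.

```python
import numpy as np

def parabola_step(f, x0, A=1.0, T=100):
    """One-shot estimate of the maximizer of a parabola from two demodulations."""
    omega = 2 * np.pi / T
    t = np.arange(1, T + 1)                    # exactly one oscillation period
    y = f(x0 + A * np.cos(omega * t))          # noise-free observations
    y1 = np.mean(y * np.cos(omega * t))        # approx. (A/2)   f'(x0)
    y2 = np.mean(y * np.cos(2 * omega * t))    # approx. (A^2/8) f''(x0)
    first = 2.0 * y1 / A
    second = 8.0 * y2 / A ** 2
    return x0 - first / second                 # Newton step, exact for a parabola

x_est = parabola_step(lambda x: -2.0 * (x - 5.0) ** 2, x0=-5.0)
```

With noisy observations the estimate of the second derivative can be close to zero or have the wrong sign, so in practice such a step would probably need safeguarding.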

3. Algorithm for LiF in discrete time 

In practical terms, measurements can never run in continuous mode. Therefore,
we now present an algorithm for LiF in discrete time. To simplify notation,
we will index sequential measurements by $y_t$, where $t = 1, \ldots, \mathcal{T}$ and
$\mathcal{T}$ denotes the length (possibly infinite) of the experiment that is run to find
$\arg\max_x f(x)$.

In discrete time we can use the same procedure as above, in which we start
with $x_0$, and for each sample oscillate around $x_0$ with a known frequency $\omega$ and
known amplitude $A$:

$$x_t = x_0 + A\cos(\omega t) \qquad (11)$$

which will result in measurements given by

$$y_t = f(x_0 + A\cos(\omega t)) + \epsilon_t \qquad (12)$$

On the basis of the arguments reported above, we can now implement a
feedback loop that iteratively adjusts the value of $x_0$ until it reaches $x_{\max}$.
After that, if the function $f$ changes, the loop can follow the value of $x$ to the
new maximizing position and thus stay “locked”. The procedure is similar to
that given in Equations 6 and 10, where we first multiply the outcome $y_t$ by
$\cos(\omega t)$ and subsequently integrate out the noise term (summing in the discrete
case). In the following sections we present two possible implementations of LiF
in discrete time for use in sequential experiments.


3.1. LiF-I: Batch updates of $x_0$

Our first implementation of LiF (denoted LiF-I) is presented in Algorithm 1. In
this implementation we sum observations $y_t$, which we multiply by $\cos(\omega t)$,
for a batch period of length $T$, after which we update $x_0$. Variable $y^\Sigma_\omega$ contains
a running sum of $y_t\cos(\omega t)$ over $t$ that is used for the integration.


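Algorithm 1 Implementation of LiF-I for single variable maximization in data
stream using a batch approach.

Require: $x_0$, $A$, $T$, $\gamma$, $y^\Sigma_\omega = 0$
$\omega = 2\pi / T$
for $t = 1, \ldots, \mathcal{T}$ do
    $x_t = x_0 + A\cos(\omega t)$
    $y_t = f(x_0 + A\cos(\omega t)) + \epsilon_t$
    $y^\Sigma_\omega = y^\Sigma_\omega + y_t\cos(\omega t)$
    if ($t \bmod T = 0$) then
        $y^*_\omega = y^\Sigma_\omega / T$
        $x_0 = x_0 + \gamma y^*_\omega$
        $y^\Sigma_\omega = 0$
    end if
end for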
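Algorithm 1 translates almost line by line into code. The following is a minimal sketch in Python, assuming that the outcome function `f` and an optional zero-argument `noise` callable are supplied by the user; the names `lif_batch`, `n_steps`, and `path` are illustrative and do not appear in the original.

```python
import math

def lif_batch(f, x0, A, T, gamma, n_steps, noise=lambda: 0.0):
    """LiF-I sketch: update x0 once per batch of T observations."""
    omega = 2 * math.pi / T            # one full oscillation per batch
    y_sum = 0.0                        # running sum of y_t * cos(omega * t)
    path = []                          # value of x0 after each observation
    for t in range(1, n_steps + 1):
        x_t = x0 + A * math.cos(omega * t)      # oscillate the treatment
        y_t = f(x_t) + noise()                  # observe the outcome
        y_sum += y_t * math.cos(omega * t)      # demodulate and accumulate
        if t % T == 0:
            y_star = y_sum / T                  # batch average
            x0 += gamma * y_star                # move x0 along the estimated slope
            y_sum = 0.0                         # start a new batch
        path.append(x0)
    return x0, path

# Noise-free parabola with its maximum at x = 5.
x_final, path = lif_batch(lambda x: -2.0 * (x - 5.0) ** 2,
                          x0=-5.0, A=1.0, T=100, gamma=0.1, n_steps=10_000)
```

Passing, for example, `noise=lambda: random.gauss(0.0, 10.0)` (after `import random`) would add Gaussian measurement noise to each observation.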


The tuning parameters for LiF-I, which should be set by the experimenter,
are $x_0$, $A$, $T$, and $\gamma$. Below we describe some general criteria on which the
choice may be based:

• It is advised to set $x_0$ as close as possible to $x_{\max}$. The choice can only
be based on the available information on $f$. The more accurate the
information, the closer the initial $x_0$ is to $x_{\max}$ and the faster the convergence of the
loop to $x_{\max}$.

• The amplitude $A$ affects the costs of the search procedure, because a large
$A$ implies querying a large range of $x$ values with (possibly) low resulting $y$
values. However, $A$ also influences the learning speed: a very small $A$ leads
to small update steps, while a large value of $A$ might lead to a value of
$\gamma y^*_\omega$ that “overshoots” $x_{\max}$.

• The integration time $T$ affects the variability of the update of $x_0$, with
larger integration times leading to a smoother update but slower
convergence.

• The learning rate $\gamma < 1$ determines the step size at each update of $x_0$. This
can be interpreted, and tuned, akin to learning rates in, for instance, stochastic
gradient descent methods (Poggio et al., 2011).

3.2. LiF-II: Continuous updates of $x_0$

For some applications the batch updates of $x_0$, as implied by the continuous
time analysis and defined in Algorithm 1, might not be feasible. Algorithm 2
presents a modified version of LiF (denoted LiF-II) in which $x_0$ is updated at every
observation. LiF-II starts by filling up a buffer of length $T$, which we denote by
the vector $y_\omega = \{NA_1, \ldots, NA_T\}$, after which each observation leads to an
update of $x_0$. In the algorithm description the values $y_{t-T}, \ldots, y_t$ are stored in
the vector $y_\omega$. By defining the learning rate as $\gamma / T$, the tuning parameters in LiF-II
are the same as those discussed for LiF-I.


Algorithm 2 Implementation of LiF-II for single variable maximization using
continuous updates.

Require: $x_0$, $A$, $T$, $\gamma$, $y_\omega = \{NA_1, \ldots, NA_T\}$
$\omega = 2\pi / T$
for $t = 1, \ldots, \mathcal{T}$ do
    $x_t = x_0 + A\cos(\omega t)$
    $y_t = f(x_0 + A\cos(\omega t)) + \epsilon_t$
    $y_\omega = \mathrm{push}(y_\omega, y_t\cos(\omega t))$
    if ($t > T$) then
        $y^*_\omega = (\sum y_\omega) / T$
        $x_0 = x_0 + \frac{\gamma}{T} y^*_\omega$
    end if
end for
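A streaming variant along the lines of Algorithm 2 can be sketched with a fixed-length buffer. This is an illustrative sketch only: it assumes that the per-observation step is $\gamma / T$ (our reading of the rescaled learning rate), and the names `lif_stream`, `n_steps`, and `path` are not from the original.

```python
import math
from collections import deque

def lif_stream(f, x0, A, T, gamma, n_steps, noise=lambda: 0.0):
    """LiF-II sketch: update x0 at every observation once the buffer is full."""
    omega = 2 * math.pi / T
    buffer = deque(maxlen=T)           # keeps the last T demodulated observations
    path = []
    for t in range(1, n_steps + 1):
        x_t = x0 + A * math.cos(omega * t)
        y_t = f(x_t) + noise()
        buffer.append(y_t * math.cos(omega * t))   # oldest entry drops out
        if t > T:
            y_star = sum(buffer) / T               # sliding-window average
            x0 += (gamma / T) * y_star             # assumed step size gamma / T
        path.append(x0)
    return x0, path

# Noise-free parabola with its maximum at x = 5.
x_final, path = lif_stream(lambda x: -2.0 * (x - 5.0) ** 2,
                           x0=-5.0, A=1.0, T=100, gamma=0.1, n_steps=10_000)
```

A `deque` with `maxlen=T` plays the role of the `push` operation: appending a new value discards the oldest one, so the window always covers one full oscillation.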


4. Simulation study 1: Comparison of Batched and streaming LiF 
and examination of tuning parameters 

In this section we study, by simulation, the differences between LiF-I and LiF-II,
and the effects of the tuning parameters $A$, $T$, and $\gamma$ in a situation in which
$y = f(x)$ is measured without noise.

Figure 2 presents the performance of both LiF-I and LiF-II for data generated
using

$$f(x) = -2(x - 5)^2 + \epsilon \qquad (13)$$

where $\epsilon \sim \mathcal{N}(0, 0)$ (i.e., no noise) and obviously $x_{\max} = 5$. The figure displays the
performance of LiF for $\mathcal{T} = 10000$ using the following tuning parameter settings:

• $x_0 = -5$

• $T \in \{10, 100, 1000\}$

• $A = 1$

• $\gamma \in \{.01, .1, .5, .9\}$

Fig 2. Examination of the effect of LiF tuning parameters $\gamma$ and $T$ for $A = 1$. Displayed are
the results for LiF-I (black solid line) and LiF-II (gray dotted line). Panels are labeled by
$\gamma$ and $T$ (e.g., Gamma=0.1, T=10); the horizontal axis shows the time in stream / number of
datapoints.

The rows of Figure 2 (top to bottom) present decreasing values of $\gamma$, while the
columns (left to right) present increasing values of $T$. We fix $A = 1$. Each panel
presents the value of $x_0$ during the data stream as selected using LiF-I (black
solid line) and LiF-II (gray dotted line). It is clear that LiF can “overshoot” the
maximum for values of $\gamma$ that are too high (top two rows). This happens for both
LiF-I and LiF-II, although LiF-I seems more robust. For small values of $\gamma$ the
performance of the algorithms is very similar, and increases in the integration
window $T$ merely smooth the updating procedure.

In Figure 3 the results are plotted for the same setup, but this time we vary
$A \in \{.1, 1, 2, 10\}$, while we fix $\gamma = .1$. Here it is clear that for large values of
$A$ LiF-I has a tendency to become unstable (see top rows), while the streaming
LiF-II is much more robust to an erroneous selection of $A$. Very small choices for
the amplitude $A$ lead to very slow updates of $x_0$ in both cases. Again, increases
in $T$ merely smooth the process. The simulations give an impression of the
importance of the tuning parameters $x_0$, $A$, $T$, $\gamma$ and their relationships. In the
remainder of this paper we will focus on the evaluation, through simulation,
of the performance of LiF-II in cases of noise and concept drift.

Fig 3. Examination of the effect of tuning parameters $A$ and $T$ for $\gamma = .1$. Displayed are the
results for LiF-I (black solid line) and LiF-II (gray dotted line).

5. Simulation study 2: Effects of noise 


Fig 4. Examination of the effect of different levels of noise $\sigma^2 \in \{10, 100, 1000, 10000\}$. Note
that LiF performs very well also in the presence of noise (see text for more details).


To examine the impact of (measurement) noise on the performance of LiF-II
we repeat the simulations described in Simulation Study 1 using the
data generating model described by Equation 13 with $\epsilon \sim \mathcal{N}(0, \sigma^2)$ and
$\sigma^2 \in \{10, 100, 1000, 10000\}$. We choose tuning parameters $x_0 = -5$, $A = 1$, $T = 100$,
$\gamma = .1$. Contrary to the simulations presented in Section 4, we now repeat the
procedure $m = 100$ times: Figure 4 presents the average $x_0$ over the 100
simulation runs as well as the 95% confidence bounds. From Figure 4 it is clear that
LiF-II performs very well in the face of noise.
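A rough calculation suggests why averaging over the integration window helps. It is a sketch under the assumption that the noise terms $\epsilon_t$ are independent with common variance $\sigma^2$ and that the window of length $T$ covers an integer number of oscillations; it is our own back-of-the-envelope illustration, not a result taken from the simulations.

```latex
\operatorname{Var}\!\left[\frac{1}{T}\sum_{t=1}^{T}\epsilon_t\cos(\omega t)\right]
  = \frac{\sigma^2}{T^2}\sum_{t=1}^{T}\cos^2(\omega t)
  = \frac{\sigma^2}{2T}
% Example: sigma^2 = 10000 and T = 100 give a standard deviation of
% sqrt(10000 / 200) = sqrt(50), roughly 7.07, for the noise term of y*.
% At x_0 = -5 the signal term is (A/2) f'(x_0) = (1/2)(-4)(-5 - 5) = 20.
```

Under these assumptions the signal term initially exceeds the noise standard deviation even at the highest noise level considered. As $x_0$ approaches $x_{\max}$ the signal term shrinks towards zero while the noise term does not, so some residual fluctuation of $x_0$ around the maximum is to be expected.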

6. Simulation study 3: Performance of LiF-II in cases of concept 
drift 

One of the advantages of Lock in Feedback as opposed to other methods of
finding $x_{\max}$ is the fact that LiF can also be used to find a maximum of a
function in cases of concept drift (Gaber et al., 2005): even when $f(x)$ changes
over time, LiF provides a method to keep the value of the treatment $x$ close to
$x_{\max}$.

To illustrate this latter advantage of LiF-II we set up a simulation using the
following data generating model:

$$f(x, t) = -2\left((x - .0025t) - 5\right)^2 + \epsilon \qquad (14)$$

where the $(x - .0025t)$ term ensures that during the stream running from $t = 0$
to $t = 10^4 = \mathcal{T}$ the value of $x_{\max}$ moves from 5 to 30. We choose $x_0 = -20$ (note
the different starting position compared to the previous simulations), $A = 1$,
$T = 100$, $\gamma = .1$ and $\sigma^2 = 10$. We investigate the performance of LiF-II in this
case of concept drift.

Figure 5 presents in the top panel $y = f(x, t)$ for distinct values of
$t \in \{0, 1000, \ldots, 10000\}$ in different shades of grey. The concept drift is illustrated
by the different locations of the parabola. Superimposed in blue is the value of
$x_0$ as selected by LiF-II. In the bottom panel the value of $x_0$ as a function of
the length of the stream is presented. It is clear that LiF-II quickly finds $x_{\max}$
and follows the maximum as it moves during the stream.
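The tracking behaviour can be reproduced in a few lines. The sketch below follows the drifting parabola of eq. 14 with a streaming update; it is a simplified illustration that omits the noise term and assumes a per-observation step of $\gamma / T$, and the names `track_drift` and `drift` are hypothetical.

```python
import math
from collections import deque

def track_drift(n_steps=10_000, x0=-20.0, A=1.0, T=100, gamma=0.1, drift=0.0025):
    """Follow the moving maximum of eq. 14 (noise omitted) with streaming updates."""
    f = lambda x, t: -2.0 * ((x - drift * t) - 5.0) ** 2
    omega = 2 * math.pi / T
    buffer = deque(maxlen=T)                     # last T demodulated observations
    for t in range(1, n_steps + 1):
        x_t = x0 + A * math.cos(omega * t)
        buffer.append(f(x_t, t) * math.cos(omega * t))
        if t > T:
            x0 += (gamma / T) * sum(buffer) / T  # assumed step size gamma / T
    x_max_final = 5.0 + drift * n_steps          # location of the maximum at the end
    return x0, x_max_final

x0_final, x_max_final = track_drift()
```

Because the update reacts to the drift only after observing it, `x0_final` is expected to trail `x_max_final` by a small lag rather than coincide with it; a larger $\gamma$ or a shorter window $T$ would presumably reduce this lag at the cost of noisier updates.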

7. Simulation study 4: Dichotomous observations 

In the introduction we described as a use case of our proposed method the
optimization of sales prices to maximize revenue. This specific case presents
a novel problem since the dependent variable $y$, encoding the purchase decision
of a customer after a price has been pitched, is dichotomous, and the actual
outcome of interest, if the firm aims to maximize its revenue, is a function of
the observable and the manipulated variable: $r_t = y_t x_t$. Since $y_t \in \{0, 1\}$,
the signal $r_t$ used as an optimization criterion contains a different type of noise;
while the expected value $\Pr(y = 1 \mid x_t)\,x_t$ of an offer could be approximated, the
data itself contains non-zero values only when the decision is made to purchase
a product.

Fig 5. Illustration of LiF in the case of concept drift. As the true maximum shifts (top panel)
LiF is able to follow the maximum and keep $x_0$ close to $x_{\max}$ (bottom panel).

To empirically examine the performance of LiF in such a setting we set up a
simulation study in which we assume that the data generating model looks as
follows:

(15)

$$r_t = y_t x_t \qquad (16)$$


Intuitively the above specification indicates that the probability that a consumer 
chooses to buy a product decreases as the price, x, of the product increases, while 
the (expected) revenue is computed using the probability of a purchase given a 
specific price multiplied by that price. Given this setup the (expected) x max is 
approximately 8. 
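The relation between purchase probability, price, and expected revenue can be made concrete with a small computation. The sketch below assumes a logistic purchase model with hypothetical coefficients `BETA0` and `BETA1`; these values are chosen for illustration only and are not the parameters of the data generating model used in the simulation.

```python
import numpy as np

# Hypothetical coefficients of a logistic purchase model (illustration only).
BETA0, BETA1 = 5.0, -0.5

def purchase_prob(x):
    """Probability of a purchase at price x; decreasing in x because BETA1 < 0."""
    return 1.0 / (1.0 + np.exp(-(BETA0 + BETA1 * x)))

def expected_revenue(x):
    """Expected revenue: purchase probability multiplied by the price."""
    return purchase_prob(x) * x

# Grid search for the revenue-maximizing price on [0, 20] in steps of 0.01.
prices = np.linspace(0.0, 20.0, 2001)
x_star = prices[np.argmax(expected_revenue(prices))]
```

A grid search is used here only because it is easy to verify; LiF itself never evaluates the expected revenue curve and only observes individual purchase decisions.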

Figure 6 shows the performance of LiF-II for two different starting values,
$x_0 = 4$ and $x_0 = 15$, using the same set of tuning parameters as those used
in Study 3 ($\mathcal{T} = 10^4$, $A = 1$, $T = 100$, $\gamma = .1$). The only change in the
algorithm compared to the earlier simulations is that $r_t = y_t x_t$ is integrated
(summed) over instead of using the observed $y_t$ directly. Also in this case, LiF
finds $x_{\max}$ fairly quickly (in under 6000 iterations).

It has to be noted that starting values that are too high, and thereby a very low
$\Pr(y = 1 \mid x_0)$, might lead to a failure to find $x_{\max}$, since LiF then gets stuck in
a local “maximum”: for very high values of $x$ the revenue $r$ will always be 0.

Fig 6. Use of LiF to find the revenue maximizing sales-price for a firm: example of a setup
in which the observed $y \in \{0, 1\}$.


8. Simulation study 5: Empirical Regret 

The previous studies show that LiF is effective in finding the value of $x_{\max}$.
However, the oscillation that is introduced clearly adds search costs to
the procedure: LiF continuously runs experiments with a certain amplitude in
its variation in $x$ to find $x_{\max}$. In the previous simulations these search costs
have not been considered, and hence while these simulations demonstrate that
LiF finds the value of $x_{\max}$, they are uninformative
regarding the costs of the procedure. To address this problem we run another
simulation study in which we monitor the empirical regret

$$\mathcal{R}(t) = \sum_{i=1}^{t}\left(f(x_{\max}) - f(x_i)\right) \qquad (17)$$

of the procedure. Thus, we compare over time in the data stream how much “is
lost” when using LiF as compared to always selecting the value of
$x$ that maximizes the outcome, as would be possible if the data generating process
were known. We use the exact setup used in Simulation Study 4 (the same data
generating model and tuning parameter settings), but we increase $\mathcal{T}$ to $10^5$.
Also, because of the noise and our interest in LiF as a general procedure, not
merely in one specific attempt, we replicate the simulation $M = 100$ times.
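The empirical regret of eq. 17 is a cumulative sum and can be computed directly from the sequence of selected treatment values. A minimal sketch, assuming `f` is the noise-free (expected) outcome function and accepts NumPy arrays; the function name `cumulative_regret` is illustrative.

```python
import numpy as np

def cumulative_regret(f, x_max, xs):
    """Cumulative regret of the treatment sequence xs relative to always playing x_max."""
    xs = np.asarray(xs, dtype=float)
    per_step = f(x_max) - f(xs)        # loss of each chosen x relative to the optimum
    return np.cumsum(per_step)         # regret after 1, 2, ..., len(xs) observations

# Small example with the parabola of eq. 13 (noise omitted), where x_max = 5.
f = lambda x: -2.0 * (x - 5.0) ** 2
regret = cumulative_regret(f, 5.0, [5.0, 4.0, 6.0])
```

In the pricing setting of Study 4, `f` would be the expected revenue at a given price rather than the parabola used in this example.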

To give insight in the performance of LiF when examining the regret of the 
procedure, we contrast the use of LiF not only to selecting the optimal value, 
but also to two other sequential experimentation scheme’s: 

• e-first: in this approach we run a limited time (up ton= 1000) experiment 
in which we randomly sample values of x uniformly between 0 and 20. 
Subsequently, we fit a simple logistic regression modeling Pr(y = l|x) = 
C(j3 o + fi\x) where £() denotes the logit link (see also Equation 15), and 
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determine x e = argmax^. £(/3 0 + p\x)x. The remaining T — n observations 
in the stream are allocated to x e . 

• Bootstrap Thompson Sampling (BTS): In this sequential experimentation 
scheme we again fit a simple logistic regression to estimate $\Pr(y = 1 \mid x)$. 
We use Stochastic Gradient Descent to update the parameters of the model 
at each time point during the data stream. Furthermore, we maintain 
$J = 100$ models, each using an online half-sampling bootstrap, to perform 
bootstrap Thompson sampling (for details of this sequential allocation 
scheme see Kaptein and Eckles, 2014). This gives $J$ different estimates of 
the model parameters $\{\beta_0^{j}, \beta_1^{j}\}$. We then select $j'$ 
uniformly at random out of $j = 1, \ldots, J$ and select treatment 
$x_{BTS} = \arg\max_x \mathcal{L}(\beta_0^{j'} + \beta_1^{j'} x)\, x$. 
This bootstrapped sampling scheme quantifies the uncertainty in 
the model estimates and uses this directly to balance exploration (querying 
new values for $x$ to learn more about the data-generating model) and 
exploitation (selecting the value of $x$ which one believes leads to the 
highest outcomes). 
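
The ε-first scheme described above can be sketched in a few lines of Python. 
This is a minimal illustration under stated assumptions, not the code used for 
the simulations: the true parameters `true_beta`, the shortened stream length 
`T`, the Newton-Raphson fit, and the grid search over $x$ are all choices made 
here for the example. 

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of Pr(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def epsilon_first(T=10_000, n=1000, true_beta=(4.0, -0.5), seed=7):
    rng = random.Random(seed)
    # Exploration stage: n values of x drawn uniformly from [0, 20].
    xs = [rng.uniform(0.0, 20.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(true_beta[0] + true_beta[1] * x) else 0
          for x in xs]
    b0, b1 = fit_logistic(xs, ys)
    # Exploitation stage: allocate the remaining T - n observations to x_eps.
    grid = [i * 0.01 for i in range(2001)]
    x_eps = max(grid, key=lambda x: x * sigmoid(b0 + b1 * x))
    return x_eps, xs + [x_eps] * (T - n)


x_eps, allocation = epsilon_first()
```

Because `x_eps` is fixed after the first `n` observations, any estimation error 
in it is carried through the rest of the stream. 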

Note that we choose random starting points of the parameter values for BTS 
that are relatively close to the true values, and that the functional form of the 
model that is used is the same as the true data generating model. Hence, this 
latter condition is expected to do very well on the current problem since it 
implements a lot of knowledge regarding the data-generating function that is 
not accessible to LiF. 
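
A compact sketch of the BTS scheme is given below. It assumes that online 
half-sampling means each bootstrap replicate is updated with each new 
observation with probability 1/2; the true parameters `true_beta`, the learning 
rate `lr`, the spread of the starting values, and the grid over $x$ are 
hypothetical choices for illustration only. 

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


GRID = [i * 0.1 for i in range(201)]  # candidate values of x in [0, 20]


def best_x(beta):
    # Grid search for argmax_x of x * sigmoid(beta0 + beta1 * x).
    return max(GRID, key=lambda x: x * sigmoid(beta[0] + beta[1] * x))


def run_bts(T=2000, J=100, lr=0.002, true_beta=(4.0, -0.5), seed=1):
    rng = random.Random(seed)
    # J replicates, started at random points close to the true values.
    models = [[true_beta[0] + rng.gauss(0.0, 1.0),
               true_beta[1] + rng.gauss(0.0, 0.1)] for _ in range(J)]
    chosen = []
    for _ in range(T):
        beta = rng.choice(models)          # draw one replicate uniformly
        x = best_x(beta)                   # act greedily under that replicate
        p_true = sigmoid(true_beta[0] + true_beta[1] * x)
        y = 1 if rng.random() < p_true else 0
        for m in models:
            if rng.random() < 0.5:         # half-sampling bootstrap
                err = y - sigmoid(m[0] + m[1] * x)
                m[0] += lr * err           # one SGD step on the log-likelihood
                m[1] += lr * err * x
        chosen.append(x)
    return chosen, models


chosen, models = run_bts()
```

The spread of the replicates plays the role of a posterior: while the replicates 
disagree, the selected $x$ varies, and as they agree the choices concentrate. 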

Figure 7 shows the performance of LiF-II in terms of average regret, 
compared to ε-first and BTS. It is clear that ε-first does not perform 
very well: logically, during the experimentation stage $t = 1, \ldots, n$ this 
method incurs a large regret. Moreover, since the probability that the true 
$x_{\max}$ is found exactly in the experiment is smaller than 1, (expected) 
linear regret is also incurred after the experimentation period. BTS performs 
much better in the long run: the regret is not linear but rather seems to be 
$O(\sqrt{t})$, which is the proven minimal regret bound known for this problem 
(Agarwal et al., 2011). 

Early on LiF performs very well on this problem; LiF is very efficient in finding 
$x_{\max}$. It is even more efficient than BTS for small $t$, despite the fact 
that in the current setup BTS is heavily favored by using the correct form of 
the data generating model, something which is very unlikely in practice. 
However, in the long run the regret of LiF is linear in $t$. This latter fact 
is easily explained: due to the continuous oscillations of $x$ by adding 
$A \cos \omega t$, LiF keeps exploring the space and thus keeps incurring 
additional costs. Even if $x_{\max}$ has been found, these search costs are 
linear in $t$. 
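
The linear growth can be made explicit with a short worked example. Assume, 
purely for illustration, that $f$ is locally quadratic around its maximum, 
$f(x) = f(x_{\max}) - a (x - x_{\max})^2$ with $a > 0$, and that LiF has locked 
the central value exactly at $x_{\max}$, so that $x_t = x_{\max} + A \cos \omega t$. 
Writing $R(t)$ for the empirical regret of Equation 17, each observation then 
contributes 

\[
f(x_{\max}) - f(x_t) = a A^2 \cos^2 \omega t ,
\qquad
R(t) = a A^2 \sum_{i=1}^{t} \cos^2 \omega i \approx \frac{a A^2}{2}\, t ,
\]

since $\cos^2$ averages to $1/2$ over many oscillations. Under these assumptions 
the slope of the regret is $a A^2 / 2$, so it scales with the square of the 
oscillation amplitude. 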

This simulation suggests that, in the bandit feedback case, LiF can be 
improved by gradually decreasing the amplitude of the oscillation: if $A$ can be 
decreased as a function of (e.g.) the approximated gradient as well as the 
current time in the stream, the exploration behavior of LiF can be systematically 
decreased over time in the stream. However, this would make LiF less sensitive 
to concept drift, which might in practice be infeasible. Hence, we currently 
regard the linear regret incurred by LiF as exploration costs necessary to ensure 
its robustness in a changing environment. 

[Figure 7 plot. Horizontal axis: time in stream. Legend: experiment, LiF, Thompson.] 

Fig 7. Overview of the (mean) empirical regret of three possible sequential allocation schemes. 

9. Discussion and Future work 

In this paper we presented Lock in Feedback as a method to find $\arg\max_x f(x)$ 
through sequential experiments. The method is appealing since it a) does not 
require the functional form of $f(x)$ to be known to derive its maximum, b) 
performs well in situations in which measurements are obtained with large 
noise, and c) allows following the maximum of a function even if that 
function changes over time. We have presented the basic mathematical arguments 
behind LiF, demonstrating how known (or imposed) oscillations in $x$ can be 
used to determine the derivative(s) of $f(x)$, which can subsequently be used to 
find $\arg\max_x f(x)$. Next, we detailed two possible implementations of LiF and 
examined their performance for a variety of tuning parameter settings. We then 
showed that a streaming version of LiF is robust both to noise and to concept 
drift. 

We believe LiF can be of use in many sequential experimentation problems in 
which the independent variable is continuous; in the introduction we discussed 
pricing, medication dosing, and the selection of items by their difficulty as 
possible examples. LiF is extremely easy to implement, and very robust to noise 
and concept drift. We thus hope that LiF can be a valuable tool in treatment 
optimization in sequential experiments. 

However, the current exposition of LiF also raises a number of questions. For 
example, the ability to use LiF for problems of higher dimensions, e.g., where 
$y = f(x)$ is a function of multiple variables, has not been explored here even 
though this extension is relatively easily made. Also, the suggested decrease of 
the amplitude in the bandit setting (Simulation Study 5) needs further scrutiny 
and calls for an analytical treatment of the use of LiF in stochastic 
optimization with bandit feedback (see, e.g., Agarwal et al., 2011). Finally, 
the currently proposed version of LiF allows one to find local maxima (or 
minima), but convergence to a global maximum is not guaranteed. Throughout this 
paper, we have been considering unimodal functions, which might, in practical 
applications, be too stringent an assumption. 

In this paper we have demonstrated the use of LiF only in cases where $x$ 
is scalar. However, when $x$ is a vector a very similar approach can be used to 
find the maximum of the function $f(x)$ in more than one dimension. In the 
two dimensional case LiF can be extended by oscillating both elements of $x$ at 
different frequencies: 

\[ x_{1,t} = x_{1,0} + A_1 \cos \omega_1 t \]
\[ x_{2,t} = x_{2,0} + A_2 \cos \omega_2 t \]

After oscillating both elements of $x$ we observe $y_t = f(x_{1,t}, x_{2,t})$ and 
we can obtain information regarding the gradient by separately computing: 

\[ y_{1,\omega} = y_t \cos \omega_1 t \]
\[ y_{2,\omega} = y_t \cos \omega_2 t \]

This simple extension allows for the use of LiF in higher dimensions. However, 
besides the fact that $\omega_1$ and $\omega_2$ should not be multiples of each 
other, the effects of the tuning parameters and the performance of this higher 
dimensional version of LiF need to be further examined. 
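
The two dimensional extension can be sketched as follows. This is a noise-free 
illustration under assumptions made here, not a tested implementation of the 
method: the function `lif_2d`, the toy surface, the batch-wise update, the 
scaling of the demodulated signal by $2/A$, and the learning rate `gamma` are 
all hypothetical. The two frequencies complete 5 and 7 whole cycles per batch, 
so that they are not multiples of each other. 

```python
import math


def lif_2d(f, x0, amplitudes, cycles, batch=100, n_batches=100, gamma=0.1):
    """Oscillate each coordinate at its own frequency and demodulate y
    at each frequency separately to estimate the two partial derivatives."""
    x = list(x0)
    # Whole numbers of cycles per batch keep the two oscillations orthogonal.
    omegas = [2.0 * math.pi * k / batch for k in cycles]
    t = 0
    for _ in range(n_batches):
        sums = [0.0, 0.0]
        for _ in range(batch):
            c = [math.cos(w * t) for w in omegas]
            y = f(x[0] + amplitudes[0] * c[0], x[1] + amplitudes[1] * c[1])
            sums[0] += y * c[0]   # y_t * cos(omega_1 * t)
            sums[1] += y * c[1]   # y_t * cos(omega_2 * t)
            t += 1
        for d in range(2):
            # The batch mean of y * cos is about (A / 2) * df/dx_d.
            grad = 2.0 * sums[d] / (batch * amplitudes[d])
            x[d] += gamma * grad
    return x


def surface(x1, x2):
    # Hypothetical unimodal test surface with its maximum at (3, -1).
    return -(x1 - 3.0) ** 2 - (x2 + 1.0) ** 2


estimate = lif_2d(surface, x0=(0.0, 0.0), amplitudes=(0.5, 0.5), cycles=(5, 7))
```

On this toy surface the central values move towards (3, -1). With noisy 
observations the same demodulation applies, but the batch length and `gamma` 
would need to be tuned. 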

Our proposed LiF algorithm, similar to many other procedures for function 
maximization, is prone to uncovering local maxima instead of global maxima. A 
logical solution to this problem would be to consider multiple starting points 
$x_0$ which are oscillated independently (possibly alternating within a data 
stream). Effectively this would allow the experimenter to find multiple maxima. 
By evaluating the value of $y$ one could decide on the best possible solution, 
or one could pool the results of multiple alternating threads to update each of 
them. Such approaches, and their robustness to the existence of local maxima, 
need further scrutiny. 
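
A minimal sketch of the multiple starting points idea is shown below. It runs 
an independent, noise-free lock-in search from each start and keeps the 
candidate with the highest outcome. The function `lif_1d`, the bimodal test 
function, the starting points, and all tuning values are hypothetical, and the 
candidates are compared by evaluating the function directly, whereas in 
practice one would compare observed values of $y$. 

```python
import math


def lif_1d(f, x0, amplitude=0.1, cycles=5, batch=100, n_batches=300, gamma=0.2):
    """One-dimensional lock-in search started at the central value x0."""
    omega = 2.0 * math.pi * cycles / batch
    x, t = x0, 0
    for _ in range(n_batches):
        s = 0.0
        for _ in range(batch):
            c = math.cos(omega * t)
            s += f(x + amplitude * c) * c   # demodulate y at frequency omega
            t += 1
        # The batch mean of y * cos is about (A / 2) * f'(x).
        x += gamma * 2.0 * s / (batch * amplitude)
    return x


def bimodal(x):
    # Hypothetical function with a local maximum at x = 2 (height 1)
    # and the global maximum at x = 8 (height 2).
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x - 8.0) ** 2)


starts = [1.0, 4.0, 9.0]
candidates = [lif_1d(bimodal, s) for s in starts]
best = max(candidates, key=bimodal)
```

Here the starts at 1 and 4 settle on the local maximum and only the start at 9 
reaches the global one, which illustrates why the choice of starting points 
matters. 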
Algorithm for finding the exact maximum of a parabola using the 
second order approximation. 

Let’s suppose that the curve $y = f(x)$ is a parabola: 

\[ y = -a (x - x_0)^2 + \gamma \]

Clearly, $f(x)$ has a maximum for $x = x_0$. Furthermore, the second derivative is 
always equal to $-2a$, regardless of the value of $x$. Interestingly, the value of 
$a$ can be easily extracted from the data accumulated during the lock-in 
procedure. For this purpose, $y(t)$ has to be multiplied by $\cos(2\omega t)$. 
Following the steps illustrated in eq. 5, eq. 6, and eq. 10, one obtains: 



which allows us to calculate a as: 



4y 2 , 